Overview

Dataset statistics

Number of variables: 40
Number of observations: 292
Missing cells: 774
Missing cells (%): 6.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 410.3 KiB
Average record size in memory: 1.4 KiB
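
The percentage and per-record figures above follow from the raw counts. A quick check, assuming the missing-cell percentage is taken over all cells (variables × observations) and the average record size is total memory divided by the number of observations; the variable names are illustrative only:

```python
n_variables = 40
n_observations = 292
missing_cells = 774
total_size_kib = 410.3

total_cells = n_variables * n_observations          # 11680 cells in the table
missing_pct = 100 * missing_cells / total_cells     # share of empty cells
avg_record_kib = total_size_kib / n_observations    # memory per row

print(f"{missing_pct:.1f}%")        # 6.6%
print(f"{avg_record_kib:.1f} KiB")  # 1.4 KiB
```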

Variable types

Categorical: 27
DateTime: 3
Numeric: 8
Unsupported: 2

Warnings

TP_NOT has constant value "2" Constant
ID_AGRAVO has constant value "B54" Constant
NU_ANO has constant value "2012" Constant
SG_UF has constant value "33" Constant
ID_RG_RESI has constant value "" Constant
ID_PAIS has constant value "1" Constant
ID_OCUPA_N has a high cardinality: 120 distinct values High cardinality
LOC_INF has a high cardinality: 52 distinct values High cardinality
DEXAME has a high cardinality: 169 distinct values High cardinality
DTRATA has a high cardinality: 98 distinct values High cardinality
SEM_NOT is highly correlated with SEM_PRI High correlation
SG_UF_NOT is highly correlated with ID_MUNICIP High correlation
ID_MUNICIP is highly correlated with SG_UF_NOT High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
CLASSI_FIN is highly correlated with COPAISINF High correlation
COPAISINF is highly correlated with CLASSI_FIN High correlation
COUFINF is highly correlated with PMM and 10 other fields High correlation
PMM is highly correlated with COUFINF and 10 other fields High correlation
CS_RACA is highly correlated with PMM and 5 other fields High correlation
RESULT is highly correlated with COUFINF and 12 other fields High correlation
AT_SINTOMA is highly correlated with PMM and 5 other fields High correlation
ID_UNIDADE is highly correlated with ID_REGIONA and 2 other fields High correlation
ID_REGIONA is highly correlated with PMM and 4 other fields High correlation
SG_UF_NOT is highly correlated with PMM and 4 other fields High correlation
DTRATA is highly correlated with COUFINF and 11 other fields High correlation
AT_LAMINA is highly correlated with COUFINF and 8 other fields High correlation
CS_ESCOL_N is highly correlated with PMM and 7 other fields High correlation
ID_MUNICIP is highly correlated with ID_UNIDADE and 3 other fields High correlation
NU_IDADE_N is highly correlated with CS_ESCOL_N High correlation
COMUNINF is highly correlated with COUFINF and 13 other fields High correlation
CLASSI_FIN is highly correlated with COUFINF and 12 other fields High correlation
LOC_INF is highly correlated with COUFINF and 13 other fields High correlation
COPAISINF is highly correlated with RESULT and 6 other fields High correlation
DSTRAESQUE is highly correlated with RESULT and 9 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 9 other fields High correlation
CS_GESTANT is highly correlated with CS_SEXO High correlation
TRA_ESQUEM is highly correlated with COUFINF and 9 other fields High correlation
AT_ATIVIDA is highly correlated with COUFINF and 7 other fields High correlation
CS_SEXO is highly correlated with CS_GESTANT High correlation
PCRUZ is highly correlated with COUFINF and 11 other fields High correlation
ID_MN_RESI is highly correlated with ID_REGIONA and 5 other fields High correlation
COUFINF is highly correlated with DTRATA and 9 other fields High correlation
ID_REGIONA is highly correlated with ID_PAIS and 6 other fields High correlation
DTRATA is highly correlated with COUFINF and 13 other fields High correlation
CS_ESCOL_N is highly correlated with ID_PAIS and 5 other fields High correlation
DSTRAESQUE is highly correlated with DTRATA and 7 other fields High correlation
ID_PAIS is highly correlated with COUFINF and 23 other fields High correlation
NU_ANO is highly correlated with COUFINF and 23 other fields High correlation
CS_SEXO is highly correlated with ID_PAIS and 6 other fields High correlation
LOC_INF is highly correlated with COUFINF and 10 other fields High correlation
SG_UF is highly correlated with COUFINF and 23 other fields High correlation
CS_RACA is highly correlated with ID_PAIS and 5 other fields High correlation
RESULT is highly correlated with DTRATA and 12 other fields High correlation
AT_SINTOMA is highly correlated with ID_PAIS and 9 other fields High correlation
SG_UF_NOT is highly correlated with ID_REGIONA and 6 other fields High correlation
TP_NOT is highly correlated with COUFINF and 23 other fields High correlation
AT_LAMINA is highly correlated with ID_PAIS and 9 other fields High correlation
COMUNINF is highly correlated with COUFINF and 9 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 13 other fields High correlation
ID_AGRAVO is highly correlated with COUFINF and 23 other fields High correlation
CS_GESTANT is highly correlated with ID_PAIS and 6 other fields High correlation
ID_RG_RESI is highly correlated with COUFINF and 23 other fields High correlation
TRA_ESQUEM is highly correlated with DTRATA and 9 other fields High correlation
AT_ATIVIDA is highly correlated with ID_PAIS and 8 other fields High correlation
CLASSI_FIN is highly correlated with ID_PAIS and 12 other fields High correlation
PCRUZ is highly correlated with DTRATA and 9 other fields High correlation
DT_NASC has 15 (5.1%) missing values Missing
DT_INVEST has 292 (100.0%) missing values Missing
PMM has 175 (59.9%) missing values Missing
DT_ENCERRA has 292 (100.0%) missing values Missing
DEXAME is uniformly distributed Uniform
DT_INVEST is an unsupported type, check if it needs cleaning or further analysis Unsupported
DT_ENCERRA is an unsupported type, check if it needs cleaning or further analysis Unsupported
COPAISINF has 166 (56.8%) zeros Zeros
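
The "Constant" warnings flag columns that hold a single value in every row. A minimal sketch of that check on a toy sample; the sample rows are hypothetical and only reuse the constant values named in the warnings, and `constant_columns` is an illustrative helper, not part of pandas-profiling:

```python
# Toy columns (hypothetical sample; constant values taken from the warnings above).
columns = {
    "TP_NOT": ["2", "2", "2", "2"],
    "SG_UF": ["33", "33", "33", "33"],
    "ID_RG_RESI": ["", "", "", ""],   # empty strings still count as one value
    "CS_SEXO": ["M", "F", "M", "M"],
}


def constant_columns(cols):
    """Return the names of columns whose values are all identical."""
    return [name for name, values in cols.items() if len(set(values)) == 1]


constant = constant_columns(columns)
print(constant)  # ['TP_NOT', 'SG_UF', 'ID_RG_RESI']

# Keep only the columns that carry information.
informative = {name: v for name, v in columns.items() if name not in constant}
print(list(informative))  # ['CS_SEXO']
```

Dropping such columns before modelling is a common choice, since a column with one value cannot discriminate between rows.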

Reproduction

Analysis started: 2021-07-06 18:41:25.727079
Analysis finished: 2021-07-06 18:41:47.178064
Duration: 21.45 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
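
The reported duration can be recovered from the two timestamps. A small check using only the standard library; the format string is an assumption matching how the timestamps are printed here:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-07-06 18:41:25.727079", FMT)
finished = datetime.strptime("2021-07-06 18:41:47.178064", FMT)

# Difference between the two timestamps, in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 21.45 seconds
```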

Variables

TP_NOT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 16.7 KiB
Value  Count
2  292

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters: 292
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row2
4th row2
5th row2

Common Values

Value  Count  Frequency (%)
2  292  100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2292
100.0%

Most occurring characters

ValueCountFrequency (%)
2292
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number292
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2292
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common292
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2292
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII292
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2292
100.0%

ID_AGRAVO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size21.8 KiB
B54
292 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters876
Distinct characters3
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB54
2nd rowB54
3rd rowB54
4th rowB54
5th rowB54

Common Values

ValueCountFrequency (%)
B54292
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
b54292
100.0%

Most occurring characters

ValueCountFrequency (%)
B292
33.3%
5292
33.3%
4292
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number584
66.7%
Uppercase Letter292
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5292
50.0%
4292
50.0%
Uppercase Letter
ValueCountFrequency (%)
B292
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common584
66.7%
Latin292
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
5292
50.0%
4292
50.0%
Latin
ValueCountFrequency (%)
B292
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII876
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B292
33.3%
5292
33.3%
4292
33.3%
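
The category counts reported for ID_AGRAVO can be reproduced with Python's `unicodedata` module. A minimal sketch, assuming the column holds the string "B54" in all 292 rows as the report states; the standard library exposes general categories (`Nd` is Decimal Number, `Lu` is Uppercase Letter) but not scripts or blocks, so those tables are not covered here:

```python
import unicodedata
from collections import Counter

values = ["B54"] * 292  # ID_AGRAVO is constant across all 292 rows

# Count the Unicode general category of every character in the column.
categories = Counter(
    unicodedata.category(ch) for value in values for ch in value
)
print(categories)  # Counter({'Nd': 584, 'Lu': 292})
```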

Distinct: 169
Distinct (%): 57.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB
Minimum: 2012-01-05 00:00:00
Maximum: 2012-12-30 00:00:00
Histogram with fixed size bins (bins=50)

SEM_NOT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 64
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 186159.5445
Minimum: 1210
Maximum: 201252
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 KiB

Quantile statistics

Minimum: 1210
5-th percentile: 1242.55
Q1: 201213
Median: 201224
Q3: 201236.25
95-th percentile: 201251
Maximum: 201252
Range: 200042
Interquartile range (IQR): 23.25

Descriptive statistics

Standard deviation: 52875.54693
Coefficient of variation (CV): 0.2840334997
Kurtosis: 8.519876282
Mean: 186159.5445
Median Absolute Deviation (MAD): 11.5
Skewness: -3.234434349
Sum: 54358587
Variance: 2795823463
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
201228  14  4.8%
201251  13  4.5%
201222  12  4.1%
201225  11  3.8%
201227  10  3.4%
201224  8  2.7%
201215  7  2.4%
201221  7  2.4%
201230  7  2.4%
201240  7  2.4%
Other values (54)  196  67.1%

Value  Count  Frequency (%)
1210  1  0.3%
1223  1  0.3%
1225  4  1.4%
1232  1  0.3%
1233  1  0.3%
1235  1  0.3%
1240  4  1.4%
1241  1  0.3%
1242  1  0.3%
1243  1  0.3%

Value  Count  Frequency (%)
201252  5  1.7%
201251  13  4.5%
201250  6  2.1%
201249  7  2.4%
201248  7  2.4%
201247  2  0.7%
201246  2  0.7%
201245  2  0.7%
201243  6  2.1%
201241  4  1.4%
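
The minimum of 1210 next to a median of 201224 suggests that SEM_NOT mixes two encodings: a four-digit form alongside a six-digit one. The report does not say what the codes mean. Under the assumption that the six-digit values are year plus week (YYYYWW) and the four-digit values are the same with a two-digit year (YYWW), a normalization step could be sketched as follows; `normalize_week` is a hypothetical helper:

```python
def normalize_week(code: int) -> int:
    """Map a YYWW code to YYYYWW; leave six-digit codes untouched.

    Assumes every two-digit year belongs to the 2000s (hypothetical rule).
    """
    if code < 10_000:            # four-digit form, e.g. 1210
        return 200_000 + code    # 1210 -> 201210
    return code


sample = [1210, 1243, 201228, 201252]
normalized = [normalize_week(c) for c in sample]
print(normalized)  # [201210, 201243, 201228, 201252]
```

If the assumption holds, the mean, standard deviation, and range reported above mostly reflect the encoding mix rather than the spread of weeks, and would need to be recomputed after normalization.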

NU_ANO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size17.5 KiB
2012
292 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters1168
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2012
2nd row2012
3rd row2012
4th row2012
5th row2012

Common Values

ValueCountFrequency (%)
2012292
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2012292
100.0%

Most occurring characters

ValueCountFrequency (%)
2584
50.0%
0292
25.0%
1292
25.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1168
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2584
50.0%
0292
25.0%
1292
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common1168
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2584
50.0%
0292
25.0%
1292
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2584
50.0%
0292
25.0%
1292
25.0%

SG_UF_NOT
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 16.9 KiB
Value  Count
33  286
31  3
42  1
32  1
35  1

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters584
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33286
97.9%
313
 
1.0%
421
 
0.3%
321
 
0.3%
351
 
0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33286
97.9%
313
 
1.0%
351
 
0.3%
321
 
0.3%
421
 
0.3%

Most occurring characters

ValueCountFrequency (%)
3577
98.8%
13
 
0.5%
22
 
0.3%
51
 
0.2%
41
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number584
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3577
98.8%
13
 
0.5%
22
 
0.3%
51
 
0.2%
41
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common584
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3577
98.8%
13
 
0.5%
22
 
0.3%
51
 
0.2%
41
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII584
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3577
98.8%
13
 
0.5%
22
 
0.3%
51
 
0.2%
41
 
0.2%

ID_MUNICIP
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct18
Distinct (%)6.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean330598.6301
Minimum310620
Maximum420820
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.4 KiB

Quantile statistics

Minimum310620
5-th percentile330330
Q1330455
median330455
Q3330455
95-th percentile330455
Maximum420820
Range110200
Interquartile range (IQR)0

Descriptive statistics

Standard deviation5874.901082
Coefficient of variation (CV)0.01777049433
Kurtosis194.5736381
Mean330598.6301
Median Absolute Deviation (MAD)0
Skewness12.35510614
Sum96534800
Variance34514462.72
MonotonicityNot monotonic
Histogram with fixed size bins (bins=18)
ValueCountFrequency (%)
330455259
88.7%
3304609
 
3.1%
3303304
 
1.4%
3106203
 
1.0%
3301702
 
0.7%
3300202
 
0.7%
3304202
 
0.7%
3550301
 
0.3%
3205301
 
0.3%
3303501
 
0.3%
Other values (8)8
 
2.7%
ValueCountFrequency (%)
3106203
1.0%
3205301
 
0.3%
3300101
 
0.3%
3300202
0.7%
3300801
 
0.3%
3301702
0.7%
3301851
 
0.3%
3301901
 
0.3%
3302401
 
0.3%
3303304
1.4%
ValueCountFrequency (%)
4208201
 
0.3%
3550301
 
0.3%
3306201
 
0.3%
3304609
 
3.1%
330455259
88.7%
3304202
 
0.7%
3303601
 
0.3%
3303501
 
0.3%
3303304
 
1.4%
3302401
 
0.3%
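
Several of the ID_MUNICIP figures are derived from others in the same block: the range is maximum minus minimum, the coefficient of variation is the standard deviation over the mean, the variance is the squared standard deviation, and the sum is the mean times the row count. A short consistency check on the reported numbers (the values look like identifier codes rather than measurements, so these statistics describe the code numbers only):

```python
import math

n_rows = 292
minimum, maximum = 310620, 420820
mean = 330598.6301
std = 5874.901082

value_range = maximum - minimum   # reported Range: 110200
cv = std / mean                   # reported CV: 0.01777049433
variance = std ** 2               # reported Variance: 34514462.72
total = round(mean * n_rows)      # reported Sum: 96534800

print(value_range, total)
print(math.isclose(cv, 0.01777049433, rel_tol=1e-6))
print(math.isclose(variance, 34514462.72, rel_tol=1e-6))
```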

ID_REGIONA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct5
Distinct (%)1.7%
Missing0
Missing (%)0.0%
Memory size17.6 KiB
286 
1449
 
3
1331
 
1
1550
 
1
1510
 
1

Length

Max length4
Median length0
Mean length0.08219178082
Min length0

Characters and Unicode

Total characters24
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
286
97.9%
14493
 
1.0%
13311
 
0.3%
15501
 
0.3%
15101
 
0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
14493
50.0%
13311
 
16.7%
15501
 
16.7%
15101
 
16.7%

Most occurring characters

ValueCountFrequency (%)
18
33.3%
46
25.0%
93
 
12.5%
53
 
12.5%
32
 
8.3%
02
 
8.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number24
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
18
33.3%
46
25.0%
93
 
12.5%
53
 
12.5%
32
 
8.3%
02
 
8.3%

Most occurring scripts

ValueCountFrequency (%)
Common24
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
18
33.3%
46
25.0%
93
 
12.5%
53
 
12.5%
32
 
8.3%
02
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII24
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
18
33.3%
46
25.0%
93
 
12.5%
53
 
12.5%
32
 
8.3%
02
 
8.3%
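
ID_REGIONA reports 0 missing values even though 286 of its 292 entries are empty strings, which is why the mean length is so small. The sketch below rebuilds the column from the counts above, reproduces the length figures, and counts blanks as missing; treating blanks as missing is an assumption about intent, not a rule taken from the report:

```python
# Column rebuilt from the Common Values table: 286 blanks plus six 4-digit codes.
values = [""] * 286 + ["1449"] * 3 + ["1331", "1550", "1510"]

total_chars = sum(len(v) for v in values)
mean_length = total_chars / len(values)
print(total_chars)              # 24
print(round(mean_length, 11))   # 0.08219178082

# Count blank strings as missing (assumption, see lead-in).
missing = sum(1 for v in values if v.strip() == "")
print(missing, f"{100 * missing / len(values):.1f}%")  # 286 97.9%
```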

ID_UNIDADE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct52
Distinct (%)17.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2883335.914
Minimum18
Maximum6771025
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.4 KiB

Quantile statistics

Minimum18
5-th percentile2268507
Q12270471
median2288338
Q33375471
95-th percentile6511875.45
Maximum6771025
Range6771007
Interquartile range (IQR)1105000

Descriptive statistics

Standard deviation: 1251205.041
Coefficient of variation (CV): 0.4339435563
Kurtosis: 3.884213494
Mean: 2883335.914
Median Absolute Deviation (MAD): 420015
Skewness: 1.589839908
Sum: 841934087
Variance: 1.565514054 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
227047194
32.2%
337547152
17.8%
270835332
 
11.0%
228833818
 
6.2%
675346911
 
3.8%
30059929
 
3.1%
22685078
 
2.7%
30034506
 
2.1%
54569325
 
1.7%
270493
 
1.0%
Other values (42)54
18.5%
ValueCountFrequency (%)
181
 
0.3%
631
 
0.3%
651
 
0.3%
123781
 
0.3%
125051
 
0.3%
125131
 
0.3%
270493
 
1.0%
20288401
 
0.3%
22684181
 
0.3%
22685078
2.7%
ValueCountFrequency (%)
67710251
 
0.3%
675346911
 
3.8%
67340141
 
0.3%
67169382
 
0.7%
63440973
 
1.0%
60439412
 
0.7%
54569325
 
1.7%
337547152
17.8%
33483342
 
0.7%
33338682
 
0.7%

Distinct: 189
Distinct (%): 64.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB
Minimum: 2011-12-23 00:00:00
Maximum: 2012-12-29 00:00:00
Histogram with fixed size bins (bins=50)

SEM_PRI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct69
Distinct (%)23.6%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean186157.7089
Minimum1210
Maximum201252
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.4 KiB

Quantile statistics

Minimum1210
5-th percentile1242
Q1201212
median201224
Q3201235
95-th percentile201250
Maximum201252
Range200042
Interquartile range (IQR)23

Descriptive statistics

Standard deviation52876.58182
Coefficient of variation (CV)0.2840418596
Kurtosis8.519875557
Mean186157.7089
Median Absolute Deviation (MAD)12
Skewness-3.23443423
Sum54358051
Variance2795932905
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
20125013
 
4.5%
20122713
 
4.5%
20122412
 
4.1%
20122211
 
3.8%
20122811
 
3.8%
20121210
 
3.4%
2012149
 
3.1%
2012309
 
3.1%
2012208
 
2.7%
2012408
 
2.7%
Other values (59)188
64.4%
ValueCountFrequency (%)
12101
 
0.3%
12181
 
0.3%
12221
 
0.3%
12232
0.7%
12241
 
0.3%
12301
 
0.3%
12321
 
0.3%
12341
 
0.3%
12391
 
0.3%
12403
1.0%
ValueCountFrequency (%)
2012521
 
0.3%
2012515
 
1.7%
20125013
4.5%
2012494
 
1.4%
2012487
2.4%
2012474
 
1.4%
2012462
 
0.7%
2012451
 
0.3%
2012443
 
1.0%
2012435
 
1.7%

DT_NASC
Date

MISSING

Distinct: 260
Distinct (%): 93.9%
Missing: 15
Missing (%): 5.1%
Memory size: 2.4 KiB
Minimum: 1928-02-16 00:00:00
Maximum: 2012-05-27 00:00:00
Histogram with fixed size bins (bins=50)

NU_IDADE_N
Real number (ℝ≥0)

HIGH CORRELATION

Distinct65
Distinct (%)22.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4016.39726
Minimum2003
Maximum4084
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.4 KiB

Quantile statistics

Minimum2003
5-th percentile4015.55
Q14028
median4036
Q34045.25
95-th percentile4063.45
Maximum4084
Range2081
Interquartile range (IQR)17.25

Descriptive statistics

Standard deviation169.278943
Coefficient of variation (CV)0.04214696207
Kurtosis85.1451916
Mean4016.39726
Median Absolute Deviation (MAD)8
Skewness-8.780314524
Sum1172788
Variance28655.36054
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
403113
 
4.5%
403213
 
4.5%
403712
 
4.1%
403312
 
4.1%
404011
 
3.8%
402811
 
3.8%
402610
 
3.4%
403010
 
3.4%
404210
 
3.4%
40369
 
3.1%
Other values (55)181
62.0%
ValueCountFrequency (%)
20031
0.3%
30011
0.3%
30031
0.3%
30041
0.3%
30071
0.3%
40012
0.7%
40022
0.7%
40051
0.3%
40062
0.7%
40111
0.3%
ValueCountFrequency (%)
40841
 
0.3%
40781
 
0.3%
40742
0.7%
40731
 
0.3%
40711
 
0.3%
40702
0.7%
40674
1.4%
40661
 
0.3%
40642
0.7%
40631
 
0.3%
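
The NU_IDADE_N values cluster around 40xx with a few 20xx and 30xx entries, which looks like a composite code rather than a plain age. Assuming the first digit is a unit flag (1 = hours, 2 = days, 3 = months, 4 = years) and the remaining three digits are the amount, a decoding sketch could look like this; the mapping and the `decode_age` helper are assumptions, not something the report states:

```python
UNITS = {1: "hours", 2: "days", 3: "months", 4: "years"}  # assumed mapping


def decode_age(code):
    """Split a 4-digit code into (amount, unit) under the assumed scheme."""
    unit_digit, amount = divmod(code, 1000)
    return amount, UNITS[unit_digit]


decoded = {code: decode_age(code) for code in (2003, 3001, 4036, 4084)}
for code, age in decoded.items():
    print(code, age)
```

Under this reading the maximum code 4084 means 84 years, which is in line with the earliest DT_NASC value (1928) for notifications made in 2012.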

CS_SEXO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)0.7%
Missing0
Missing (%)0.0%
Memory size18.9 KiB
M
217 
F
75 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters292
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowM
2nd rowF
3rd rowM
4th rowM
5th rowM

Common Values

ValueCountFrequency (%)
M217
74.3%
F75
 
25.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
m217
74.3%
f75
 
25.7%

Most occurring characters

ValueCountFrequency (%)
M217
74.3%
F75
 
25.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter292
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M217
74.3%
F75
 
25.7%

Most occurring scripts

ValueCountFrequency (%)
Latin292
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M217
74.3%
F75
 
25.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII292
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M217
74.3%
F75
 
25.7%

CS_GESTANT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size16.7 KiB
6
224 
5
67 
4
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters292
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.3%

Sample

1st row6
2nd row5
3rd row6
4th row6
5th row6

Common Values

ValueCountFrequency (%)
6224
76.7%
567
 
22.9%
41
 
0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
6224
76.7%
567
 
22.9%
41
 
0.3%

Most occurring characters

ValueCountFrequency (%)
6224
76.7%
567
 
22.9%
41
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number292
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
6224
76.7%
567
 
22.9%
41
 
0.3%

Most occurring scripts

ValueCountFrequency (%)
Common292
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
6224
76.7%
567
 
22.9%
41
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII292
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6224
76.7%
567
 
22.9%
41
 
0.3%

CS_RACA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 18.9 KiB
Value  Count
1  217
4  39
2  21
9  14
(empty)  1

Length

Max length1
Median length1
Mean length0.9965753425
Min length0

Characters and Unicode

Total characters291
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.3%

Sample

1st row1
2nd row1
3rd row4
4th row4
5th row4

Common Values

ValueCountFrequency (%)
1217
74.3%
439
 
13.4%
221
 
7.2%
914
 
4.8%
1
 
0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1217
74.6%
439
 
13.4%
221
 
7.2%
914
 
4.8%

Most occurring characters

ValueCountFrequency (%)
1217
74.6%
439
 
13.4%
221
 
7.2%
914
 
4.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number291
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1217
74.6%
439
 
13.4%
221
 
7.2%
914
 
4.8%

Most occurring scripts

ValueCountFrequency (%)
Common291
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1217
74.6%
439
 
13.4%
221
 
7.2%
914
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII291
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1217
74.6%
439
 
13.4%
221
 
7.2%
914
 
4.8%
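
The Common Values table gives value 1 a share of 74.3%, while the later CS_RACA tables give it 74.6%. The difference is consistent with the later tables leaving out the single empty-string row (Total characters is 291, not 292). A quick check under that assumption:

```python
# Counts from the Common Values table; "" is the one empty-string entry.
counts = {"1": 217, "4": 39, "2": 21, "9": 14, "": 1}

total_all = sum(counts.values())                                # 292 rows
total_non_empty = sum(n for v, n in counts.items() if v != "")  # 291 rows

share_all = 100 * counts["1"] / total_all
share_non_empty = 100 * counts["1"] / total_non_empty
print(f"{share_all:.1f}%")        # 74.3%
print(f"{share_non_empty:.1f}%")  # 74.6%
```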

CS_ESCOL_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct11
Distinct (%)3.8%
Missing0
Missing (%)0.0%
Memory size17.0 KiB
08
159 
06
28 
09
27 
05
24 
07
19 
Other values (6)
35 

Length

Max length2
Median length2
Mean length1.965753425
Min length0

Characters and Unicode

Total characters574
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row08
2nd row04
3rd row04
4th row07
5th row08

Common Values

ValueCountFrequency (%)
08159
54.5%
0628
 
9.6%
0927
 
9.2%
0524
 
8.2%
0719
 
6.5%
1012
 
4.1%
049
 
3.1%
5
 
1.7%
034
 
1.4%
013
 
1.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
08159
55.4%
0628
 
9.8%
0927
 
9.4%
0524
 
8.4%
0719
 
6.6%
1012
 
4.2%
049
 
3.1%
034
 
1.4%
013
 
1.0%
022
 
0.7%

Most occurring characters

ValueCountFrequency (%)
0287
50.0%
8159
27.7%
628
 
4.9%
927
 
4.7%
524
 
4.2%
719
 
3.3%
115
 
2.6%
49
 
1.6%
34
 
0.7%
22
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number574
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0287
50.0%
8159
27.7%
628
 
4.9%
927
 
4.7%
524
 
4.2%
719
 
3.3%
115
 
2.6%
49
 
1.6%
34
 
0.7%
22
 
0.3%

Most occurring scripts

ValueCountFrequency (%)
Common574
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0287
50.0%
8159
27.7%
628
 
4.9%
927
 
4.7%
524
 
4.2%
719
 
3.3%
115
 
2.6%
49
 
1.6%
34
 
0.7%
22
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII574
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0287
50.0%
8159
27.7%
628
 
4.9%
927
 
4.7%
524
 
4.2%
719
 
3.3%
115
 
2.6%
49
 
1.6%
34
 
0.7%
22
 
0.3%

SG_UF
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size16.9 KiB
33
292 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters584
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33292
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33292
100.0%

Most occurring characters

ValueCountFrequency (%)
3584
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number584
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3584
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common584
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3584
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII584
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3584
100.0%

ID_MN_RESI
Real number (ℝ≥0)

HIGH CORRELATION

Distinct25
Distinct (%)8.6%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean330410.3356
Minimum330010
Maximum330610
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.4 KiB

Quantile statistics

Minimum330010
5-th percentile330170
Q1330455
median330455
Q3330455
95-th percentile330455
Maximum330610
Range600
Interquartile range (IQR)0

Descriptive statistics

Standard deviation101.3043984
Coefficient of variation (CV)0.0003066017841
Kurtosis4.813212013
Mean330410.3356
Median Absolute Deviation (MAD)0
Skewness-2.265857815
Sum96479818
Variance10262.58113
MonotonicityNot monotonic
Histogram with fixed size bins (bins=25)
ValueCountFrequency (%)
330455210
71.9%
33033022
 
7.5%
3304609
 
3.1%
3301709
 
3.1%
3300404
 
1.4%
3302504
 
1.4%
3304204
 
1.4%
3302254
 
1.4%
3302404
 
1.4%
3303603
 
1.0%
Other values (15)19
 
6.5%
ValueCountFrequency (%)
3300102
 
0.7%
3300202
 
0.7%
3300404
1.4%
3300801
 
0.3%
3301709
3.1%
3301851
 
0.3%
3301901
 
0.3%
3302254
1.4%
3302404
1.4%
3302504
1.4%
ValueCountFrequency (%)
3306101
 
0.3%
3305601
 
0.3%
3305101
 
0.3%
3304901
 
0.3%
3304609
 
3.1%
330455210
71.9%
3304522
 
0.7%
3304204
 
1.4%
3304141
 
0.3%
3303603
 
1.0%

ID_RG_RESI
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size17.5 KiB
292 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
292
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

ID_PAIS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size16.7 KiB
1
292 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters292
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1292
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1292
100.0%

Most occurring characters

ValueCountFrequency (%)
1292
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number292
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1292
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common292
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1292
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII292
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1292
100.0%

DT_INVEST
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 292
Missing (%): 100.0%
Memory size: 2.4 KiB

ID_OCUPA_N
Categorical

HIGH CARDINALITY

Distinct120
Distinct (%)41.1%
Missing0
Missing (%)0.0%
Memory size17.9 KiB
83 
999991
18 
214205
17 
212405
 
6
252105
 
5
Other values (115)
163 

Length

Max length6
Median length6
Mean length4.294520548
Min length0

Characters and Unicode

Total characters1254
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique89 ?
Unique (%)30.5%

Sample

1st row810205
2nd row999991
3rd row233105
4th row999991
5th row212315

Common Values

ValueCountFrequency (%)
83
28.4%
99999118
 
6.2%
21420517
 
5.8%
2124056
 
2.1%
2521055
 
1.7%
2123155
 
1.7%
3511155
 
1.7%
2522104
 
1.4%
2410054
 
1.4%
2221054
 
1.4%
Other values (110)141
48.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
99999118
 
8.6%
21420517
 
8.1%
2124056
 
2.9%
2123155
 
2.4%
3511155
 
2.4%
2521055
 
2.4%
2124154
 
1.9%
2410054
 
1.9%
2211054
 
1.9%
2221054
 
1.9%
Other values (109)137
65.6%

Most occurring characters

ValueCountFrequency (%)
1273
21.8%
2266
21.2%
5191
15.2%
0174
13.9%
9120
9.6%
391
 
7.3%
482
 
6.5%
723
 
1.8%
818
 
1.4%
616
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1254
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1273
21.8%
2266
21.2%
5191
15.2%
0174
13.9%
9120
9.6%
391
 
7.3%
482
 
6.5%
723
 
1.8%
818
 
1.4%
616
 
1.3%

Most occurring scripts

ValueCountFrequency (%)
Common1254
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1273
21.8%
2266
21.2%
5191
15.2%
0174
13.9%
9120
9.6%
391
 
7.3%
482
 
6.5%
723
 
1.8%
818
 
1.4%
616
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1254
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1273
21.8%
2266
21.2%
5191
15.2%
0174
13.9%
9120
9.6%
391
 
7.3%
482
 
6.5%
723
 
1.8%
818
 
1.4%
616
 
1.3%

CLASSI_FIN
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size16.7 KiB
2
158 
1
126 
8
 
8

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters292
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row2
3rd row1
4th row2
5th row1

Common Values

Value    Count  Frequency (%)
2          158  54.1%
1          126  43.2%
8            8  2.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2158
54.1%
1126
43.2%
88
 
2.7%

Most occurring characters

ValueCountFrequency (%)
2158
54.1%
1126
43.2%
88
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number292
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2158
54.1%
1126
43.2%
88
 
2.7%

Most occurring scripts

ValueCountFrequency (%)
Common292
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2158
54.1%
1126
43.2%
88
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII292
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2158
54.1%
1126
43.2%
88
 
2.7%

AT_ATIVIDA
Categorical

HIGH CORRELATION

Distinct12
Distinct (%)4.1%
Missing0
Missing (%)0.0%
Memory size17.1 KiB
10
243 
11
 
19
9
 
8
 
7
99
 
6
Other values (7)
 
9

Length

Max length2
Median length2
Mean length1.897260274
Min length0

Characters and Unicode

Total characters554
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)1.7%

Sample

1st row11
2nd row10
3rd row10
4th row4
5th row9

Common Values

Value               Count  Frequency (%)
10                    243  83.2%
11                     19  6.5%
9                       8  2.7%
(empty)                 7  2.4%
99                      6  2.1%
1                       2  0.7%
7                       2  0.7%
8                       1  0.3%
4                       1  0.3%
2                       1  0.3%
Other values (2)        2  0.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
10243
85.3%
1119
 
6.7%
98
 
2.8%
996
 
2.1%
12
 
0.7%
72
 
0.7%
81
 
0.4%
41
 
0.4%
21
 
0.4%
121
 
0.4%

Most occurring characters

ValueCountFrequency (%)
1284
51.3%
0243
43.9%
920
 
3.6%
72
 
0.4%
22
 
0.4%
41
 
0.2%
31
 
0.2%
81
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number554
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1284
51.3%
0243
43.9%
920
 
3.6%
72
 
0.4%
22
 
0.4%
41
 
0.2%
31
 
0.2%
81
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common554
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1284
51.3%
0243
43.9%
920
 
3.6%
72
 
0.4%
22
 
0.4%
41
 
0.2%
31
 
0.2%
81
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII554
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1284
51.3%
0243
43.9%
920
 
3.6%
72
 
0.4%
22
 
0.4%
41
 
0.2%
31
 
0.2%
81
 
0.2%

AT_LAMINA
Categorical

HIGH CORRELATION

Distinct4
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Memory size18.9 KiB
2
244 
1
32 
3
 
9
 
7

Length

Max length1
Median length1
Mean length0.9760273973
Min length0

Characters and Unicode

Total characters285
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row2
4th row2
5th row2

Common Values

Value    Count  Frequency (%)
2          244  83.6%
1           32  11.0%
3            9  3.1%
(empty)      7  2.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2244
85.6%
132
 
11.2%
39
 
3.2%

Most occurring characters

ValueCountFrequency (%)
2244
85.6%
132
 
11.2%
39
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number285
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2244
85.6%
132
 
11.2%
39
 
3.2%

Most occurring scripts

ValueCountFrequency (%)
Common285
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2244
85.6%
132
 
11.2%
39
 
3.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII285
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2244
85.6%
132
 
11.2%
39
 
3.2%

AT_SINTOMA
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size18.9 KiB
1
273 
2
 
12
 
7

Length

Max length1
Median length1
Mean length0.9760273973
Min length0

Characters and Unicode

Total characters285
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value    Count  Frequency (%)
1          273  93.5%
2           12  4.1%
(empty)      7  2.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1273
95.8%
212
 
4.2%

Most occurring characters

ValueCountFrequency (%)
1273
95.8%
212
 
4.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number285
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1273
95.8%
212
 
4.2%

Most occurring scripts

ValueCountFrequency (%)
Common285
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1273
95.8%
212
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII285
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1273
95.8%
212
 
4.2%

TPAUTOCTO
Categorical

HIGH CORRELATION

Distinct4
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Memory size18.1 KiB
165 
2
120 
1
 
6
3
 
1

Length

Max length1
Median length0
Mean length0.4349315068
Min length0

Characters and Unicode

Total characters127
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.3%

Sample

1st row2
2nd row
3rd row2
4th row
5th row2

Common Values

Value    Count  Frequency (%)
(empty)    165  56.5%
2          120  41.1%
1            6  2.1%
3            1  0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2120
94.5%
16
 
4.7%
31
 
0.8%

Most occurring characters

ValueCountFrequency (%)
2120
94.5%
16
 
4.7%
31
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number127
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2120
94.5%
16
 
4.7%
31
 
0.8%

Most occurring scripts

ValueCountFrequency (%)
Common127
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2120
94.5%
16
 
4.7%
31
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII127
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2120
94.5%
16
 
4.7%
31
 
0.8%

COUFINF
Categorical

HIGH CORRELATION

Distinct12
Distinct (%)4.1%
Missing0
Missing (%)0.0%
Memory size17.4 KiB
229 
AM
 
14
PA
 
11
AP
 
10
RO
 
8
Other values (7)
 
20

Length

Max length2
Median length0
Mean length0.4315068493
Min length0

Characters and Unicode

Total characters126
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.0%

Sample

1st rowAP
2nd row
3rd rowAP
4th row
5th rowAP

Common Values

Value               Count  Frequency (%)
(empty)               229  78.4%
AM                     14  4.8%
PA                     11  3.8%
AP                     10  3.4%
RO                      8  2.7%
RJ                      8  2.7%
AC                      5  1.7%
MS                      2  0.7%
MT                      2  0.7%
RR                      1  0.3%
Other values (2)        2  0.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
am14
22.2%
pa11
17.5%
ap10
15.9%
ro8
12.7%
rj8
12.7%
ac5
 
7.9%
ms2
 
3.2%
mt2
 
3.2%
rr1
 
1.6%
to1
 
1.6%

Most occurring characters

ValueCountFrequency (%)
A41
32.5%
P21
16.7%
M19
15.1%
R18
14.3%
O9
 
7.1%
J8
 
6.3%
C5
 
4.0%
T3
 
2.4%
S2
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter126
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A41
32.5%
P21
16.7%
M19
15.1%
R18
14.3%
O9
 
7.1%
J8
 
6.3%
C5
 
4.0%
T3
 
2.4%
S2
 
1.6%

Most occurring scripts

ValueCountFrequency (%)
Latin126
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A41
32.5%
P21
16.7%
M19
15.1%
R18
14.3%
O9
 
7.1%
J8
 
6.3%
C5
 
4.0%
T3
 
2.4%
S2
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII126
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A41
32.5%
P21
16.7%
M19
15.1%
R18
14.3%
O9
 
7.1%
J8
 
6.3%
C5
 
4.0%
T3
 
2.4%
S2
 
1.6%

COPAISINF
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct15
Distinct (%)5.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean13.53767123
Minimum0
Maximum188
Zeros166
Zeros (%)56.8%
Negative0
Negative (%)0.0%
Memory size2.4 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q31
95-th percentile130.7
Maximum188
Range188
Interquartile range (IQR)1

Descriptive statistics

Standard deviation37.96446999
Coefficient of variation (CV)2.80435751
Kurtosis11.07436429
Mean13.53767123
Median Absolute Deviation (MAD)0
Skewness3.445138948
Sum3953
Variance1441.300981
MonotonicityNot monotonic
Histogram with fixed size bins (bins=15)
Common values

Value               Count  Frequency (%)
0                     166  56.8%
1                      63  21.6%
31                     33  11.3%
7                      12  4.1%
177                     3  1.0%
164                     3  1.0%
140                     3  1.0%
186                     2  0.7%
188                     1  0.3%
153                     1  0.3%
Other values (5)        5  1.7%

Minimum 10 values

Value               Count  Frequency (%)
0                     166  56.8%
1                      63  21.6%
7                      12  4.1%
31                     33  11.3%
113                     1  0.3%
114                     1  0.3%
128                     1  0.3%
134                     1  0.3%
138                     1  0.3%
140                     3  1.0%

Maximum 10 values

Value               Count  Frequency (%)
188                     1  0.3%
186                     2  0.7%
177                     3  1.0%
164                     3  1.0%
153                     1  0.3%
140                     3  1.0%
138                     1  0.3%
134                     1  0.3%
128                     1  0.3%
114                     1  0.3%

COMUNINF
Categorical

HIGH CORRELATION

Distinct38
Distinct (%)13.0%
Missing0
Missing (%)0.0%
Memory size17.6 KiB
229 
160027
 
5
150215
 
5
130002
 
5
130020
 
4
Other values (33)
44 

Length

Max length6
Median length0
Mean length1.294520548
Min length0

Characters and Unicode

Total characters378
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique27 ?
Unique (%)9.2%

Sample

1st row160027
2nd row
3rd row160040
4th row
5th row160027

Common Values

Value               Count  Frequency (%)
(empty)               229  78.4%
160027                  5  1.7%
150215                  5  1.7%
130002                  5  1.7%
130020                  4  1.4%
110020                  4  1.4%
330460                  4  1.4%
110001                  3  1.0%
160053                  2  0.7%
510025                  2  0.7%
Other values (28)      29  9.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1300025
 
7.9%
1600275
 
7.9%
1502155
 
7.9%
1300204
 
6.3%
1100204
 
6.3%
3304604
 
6.3%
1100013
 
4.8%
1200052
 
3.2%
5100252
 
3.2%
1600532
 
3.2%
Other values (27)27
42.9%

Most occurring characters

ValueCountFrequency (%)
0150
39.7%
177
20.4%
342
 
11.1%
238
 
10.1%
531
 
8.2%
618
 
4.8%
410
 
2.6%
76
 
1.6%
86
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number378
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0150
39.7%
177
20.4%
342
 
11.1%
238
 
10.1%
531
 
8.2%
618
 
4.8%
410
 
2.6%
76
 
1.6%
86
 
1.6%

Most occurring scripts

ValueCountFrequency (%)
Common378
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0150
39.7%
177
20.4%
342
 
11.1%
238
 
10.1%
531
 
8.2%
618
 
4.8%
410
 
2.6%
76
 
1.6%
86
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII378
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0150
39.7%
177
20.4%
342
 
11.1%
238
 
10.1%
531
 
8.2%
618
 
4.8%
410
 
2.6%
76
 
1.6%
86
 
1.6%

LOC_INF
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct52
Distinct (%)17.8%
Missing0
Missing (%)0.0%
Memory size17.5 KiB
191 
LUAN
 
17
MOCA
 
9
PIUR
 
8
LARA
 
4
Other values (47)
63 

Length

Max length4
Median length0
Mean length1.363013699
Min length0

Characters and Unicode

Total characters398
Distinct characters25
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique38 ?
Unique (%)13.0%

Sample

1st rowSITO
2nd row
3rd rowMAZA
4th row
5th rowLARA

Common Values

Value               Count  Frequency (%)
(empty)               191  65.4%
LUAN                   17  5.8%
MOCA                    9  3.1%
PIUR                    8  2.7%
LARA                    4  1.4%
CANA                    4  1.4%
ALVA                    4  1.4%
ALTA                    3  1.0%
LAUN                    3  1.0%
PERU                    3  1.0%
Other values (42)      46  15.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
luan17
 
16.8%
moca9
 
8.9%
piur8
 
7.9%
alva4
 
4.0%
cana4
 
4.0%
lara4
 
4.0%
alta3
 
3.0%
laun3
 
3.0%
peru3
 
3.0%
atal2
 
2.0%
Other values (41)44
43.6%

Most occurring characters

ValueCountFrequency (%)
A92
23.1%
L38
9.5%
N37
9.3%
U34
 
8.5%
R31
 
7.8%
I23
 
5.8%
O22
 
5.5%
C19
 
4.8%
P18
 
4.5%
T15
 
3.8%
Other values (15)69
17.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter396
99.5%
Other Punctuation2
 
0.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A92
23.2%
L38
9.6%
N37
9.3%
U34
 
8.6%
R31
 
7.8%
I23
 
5.8%
O22
 
5.6%
C19
 
4.8%
P18
 
4.5%
T15
 
3.8%
Other values (13)67
16.9%
Other Punctuation
ValueCountFrequency (%)
%1
50.0%
:1
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin396
99.5%
Common2
 
0.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
A92
23.2%
L38
9.6%
N37
9.3%
U34
 
8.6%
R31
 
7.8%
I23
 
5.8%
O22
 
5.6%
C19
 
4.8%
P18
 
4.5%
T15
 
3.8%
Other values (13)67
16.9%
Common
ValueCountFrequency (%)
%1
50.0%
:1
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII398
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A92
23.1%
L38
9.5%
N37
9.3%
U34
 
8.5%
R31
 
7.8%
I23
 
5.8%
O22
 
5.5%
C19
 
4.8%
P18
 
4.5%
T15
 
3.8%
Other values (15)69
17.3%

DEXAME
Categorical

HIGH CARDINALITY
UNIFORM

Distinct169
Distinct (%)57.9%
Missing0
Missing (%)0.0%
Memory size19.2 KiB
None
 
7
2012-06-19
 
6
2012-10-05
 
6
2012-05-30
 
5
2012-03-28
 
5
Other values (164)
263 

Length

Max length10
Median length10
Mean length9.856164384
Min length4

Characters and Unicode

Total characters2878
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique95 ?
Unique (%)32.5%

Sample

1st row2012-01-05
2nd row2012-01-06
3rd row2012-01-12
4th row2012-01-17
5th row2012-01-23

Common Values

Value               Count  Frequency (%)
None                    7  2.4%
2012-06-19              6  2.1%
2012-10-05              6  2.1%
2012-05-30              5  1.7%
2012-03-28              5  1.7%
2012-04-03              5  1.7%
2012-07-10              4  1.4%
2012-12-18              4  1.4%
2012-02-29              3  1.0%
2012-07-06              3  1.0%
Other values (159)    244  83.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none7
 
2.4%
2012-10-056
 
2.1%
2012-06-196
 
2.1%
2012-04-035
 
1.7%
2012-05-305
 
1.7%
2012-03-285
 
1.7%
2012-07-104
 
1.4%
2012-12-184
 
1.4%
2012-08-273
 
1.0%
2012-07-063
 
1.0%
Other values (159)244
83.6%

Most occurring characters

ValueCountFrequency (%)
2726
25.2%
0644
22.4%
-570
19.8%
1505
17.5%
372
 
2.5%
769
 
2.4%
562
 
2.2%
660
 
2.1%
450
 
1.7%
948
 
1.7%
Other values (5)72
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2280
79.2%
Dash Punctuation570
 
19.8%
Lowercase Letter21
 
0.7%
Uppercase Letter7
 
0.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2726
31.8%
0644
28.2%
1505
22.1%
372
 
3.2%
769
 
3.0%
562
 
2.7%
660
 
2.6%
450
 
2.2%
948
 
2.1%
844
 
1.9%
Lowercase Letter
ValueCountFrequency (%)
o7
33.3%
n7
33.3%
e7
33.3%
Dash Punctuation
ValueCountFrequency (%)
-570
100.0%
Uppercase Letter
ValueCountFrequency (%)
N7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2850
99.0%
Latin28
 
1.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2726
25.5%
0644
22.6%
-570
20.0%
1505
17.7%
372
 
2.5%
769
 
2.4%
562
 
2.2%
660
 
2.1%
450
 
1.8%
948
 
1.7%
Latin
ValueCountFrequency (%)
N7
25.0%
o7
25.0%
n7
25.0%
e7
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2878
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2726
25.2%
0644
22.4%
-570
19.8%
1505
17.5%
372
 
2.5%
769
 
2.4%
562
 
2.2%
660
 
2.1%
450
 
1.7%
948
 
1.7%
Other values (5)72
 
2.5%

RESULT
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)2.4%
Missing0
Missing (%)0.0%
Memory size18.9 KiB
1
158 
2
61 
4
59 
 
7
5
 
5
Other values (2)
 
2

Length

Max length2
Median length1
Mean length0.9794520548
Min length0

Characters and Unicode

Total characters286
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.7%

Sample

1st row4
2nd row1
3rd row4
4th row1
5th row4

Common Values

Value    Count  Frequency (%)
1          158  54.1%
2           61  20.9%
4           59  20.2%
(empty)      7  2.4%
5            5  1.7%
10           1  0.3%
8            1  0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1158
55.4%
261
 
21.4%
459
 
20.7%
55
 
1.8%
101
 
0.4%
81
 
0.4%

Most occurring characters

ValueCountFrequency (%)
1159
55.6%
261
 
21.3%
459
 
20.6%
55
 
1.7%
01
 
0.3%
81
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number286
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1159
55.6%
261
 
21.3%
459
 
20.6%
55
 
1.7%
01
 
0.3%
81
 
0.3%

Most occurring scripts

ValueCountFrequency (%)
Common286
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1159
55.6%
261
 
21.3%
459
 
20.6%
55
 
1.7%
01
 
0.3%
81
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII286
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1159
55.6%
261
 
21.3%
459
 
20.6%
55
 
1.7%
01
 
0.3%
81
 
0.3%

PMM
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct51
Distinct (%)43.6%
Missing175
Missing (%)59.9%
Infinite0
Infinite (%)0.0%
Mean48977.42735
Minimum3
Maximum5011000
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.4 KiB

Quantile statistics

Minimum3
5-th percentile274
Q1380
median380
Q3480
95-th percentile10048
Maximum5011000
Range5010997
Interquartile range (IQR)100

Descriptive statistics

Standard deviation464575.1601
Coefficient of variation (CV)9.485495366
Kurtosis115.0712717
Mean48977.42735
Median Absolute Deviation (MAD)10
Skewness10.69071912
Sum5730359
Variance2.158300794 × 10^11
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
380                    41  14.0%
390                    11  3.8%
370                     4  1.4%
480                     3  1.0%
400                     3  1.0%
680                     3  1.0%
385                     3  1.0%
280                     2  0.7%
780                     2  0.7%
340                     2  0.7%
Other values (41)      43  14.7%
(Missing)             175  59.9%

Minimum 10 values

Value               Count  Frequency (%)
3                       1  0.3%
6                       1  0.3%
95                      1  0.3%
190                     1  0.3%
220                     1  0.3%
250                     1  0.3%
280                     2  0.7%
301                     1  0.3%
307                     2  0.7%
310                     1  0.3%

Maximum 10 values

Value               Count  Frequency (%)
5011000                 1  0.3%
440000                  1  0.3%
100000                  1  0.3%
40640                   1  0.3%
32640                   1  0.3%
10200                   1  0.3%
10010                   1  0.3%
10002                   1  0.3%
8360                    1  0.3%
7760                    1  0.3%

PCRUZ
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)2.4%
Missing0
Missing (%)0.0%
Memory size18.1 KiB
165 
3
83 
4
25 
5
 
7
1
 
6
Other values (2)
 
6

Length

Max length1
Median length0
Mean length0.4349315068
Min length0

Characters and Unicode

Total characters127
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row3
2nd row
3rd row3
4th row
5th row4

Common Values

Value    Count  Frequency (%)
(empty)    165  56.5%
3           83  28.4%
4           25  8.6%
5            7  2.4%
1            6  2.1%
2            4  1.4%
6            2  0.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
383
65.4%
425
 
19.7%
57
 
5.5%
16
 
4.7%
24
 
3.1%
62
 
1.6%

Most occurring characters

ValueCountFrequency (%)
383
65.4%
425
 
19.7%
57
 
5.5%
16
 
4.7%
24
 
3.1%
62
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number127
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
383
65.4%
425
 
19.7%
57
 
5.5%
16
 
4.7%
24
 
3.1%
62
 
1.6%

Most occurring scripts

ValueCountFrequency (%)
Common127
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
383
65.4%
425
 
19.7%
57
 
5.5%
16
 
4.7%
24
 
3.1%
62
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII127
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
383
65.4%
425
 
19.7%
57
 
5.5%
16
 
4.7%
24
 
3.1%
62
 
1.6%

TRA_ESQUEM
Categorical

HIGH CORRELATION

Distinct6
Distinct (%)2.1%
Missing0
Missing (%)0.0%
Memory size18.5 KiB
165 
99
69 
1
55 
4
 
1
10
 
1

Length

Max length2
Median length0
Mean length0.6746575342
Min length0

Characters and Unicode

Total characters197
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.0%

Sample

1st row1
2nd row
3rd row1
4th row
5th row1

Common Values

Value    Count  Frequency (%)
(empty)    165  56.5%
99          69  23.6%
1           55  18.8%
4            1  0.3%
10           1  0.3%
7            1  0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
9969
54.3%
155
43.3%
41
 
0.8%
101
 
0.8%
71
 
0.8%

Most occurring characters

ValueCountFrequency (%)
9138
70.1%
156
28.4%
41
 
0.5%
01
 
0.5%
71
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number197
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
9138
70.1%
156
28.4%
41
 
0.5%
01
 
0.5%
71
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common197
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
9138
70.1%
156
28.4%
41
 
0.5%
01
 
0.5%
71
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII197
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9138
70.1%
156
28.4%
41
 
0.5%
01
 
0.5%
71
 
0.5%

DSTRAESQUE
Categorical

HIGH CORRELATION

Distinct19
Distinct (%)6.5%
Missing0
Missing (%)0.0%
Memory size18.7 KiB
223 
ARTESUNATO+MEFLOQUINA
26 
ARTESUNATO + MEFLOQUINA
 
14
ARTESUANTO+MEFLOQUINA
 
7
ARTESUNATO INJETAVEL
 
3
Other values (14)
 
19

Length

Max length30
Median length0
Mean length5.157534247
Min length0

Characters and Unicode

Total characters1506
Distinct characters27
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique11 ?
Unique (%)3.8%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value                      Count  Frequency (%)
(empty)                      223  76.4%
ARTESUNATO+MEFLOQUINA         26  8.9%
ARTESUNATO + MEFLOQUINA       14  4.8%
ARTESUANTO+MEFLOQUINA          7  2.4%
ARTESUNATO INJETAVEL           3  1.0%
CLOROQUINA E PRIMAQUINA        3  1.0%
ARTESUNATO+ MEFLOQUINA         3  1.0%
ARTESUANTO + MEFLOQUINA        2  0.7%
ARTESUANTO INJETAVEL           1  0.3%
ARTHEMETER LUMEFANTRINA        1  0.3%
Other values (9)               9  3.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
artesunato+mefloquina26
21.0%
artesunato22
17.7%
mefloquina20
16.1%
17
13.7%
artesuanto+mefloquina7
 
5.6%
injetavel5
 
4.0%
primaquina5
 
4.0%
cloroquina3
 
2.4%
e3
 
2.4%
artesuanto3
 
2.4%
Other values (13)13
10.5%

Most occurring characters

ValueCountFrequency (%)
A210
13.9%
N139
9.2%
E137
9.1%
T133
8.8%
U132
 
8.8%
O129
 
8.6%
I82
 
5.4%
R77
 
5.1%
L68
 
4.5%
Q68
 
4.5%
Other values (17)331
22.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1387
92.1%
Math Symbol58
 
3.9%
Space Separator55
 
3.7%
Decimal Number5
 
0.3%
Other Punctuation1
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A210
15.1%
N139
10.0%
E137
9.9%
T133
9.6%
U132
9.5%
O129
9.3%
I82
 
5.9%
R77
 
5.6%
L68
 
4.9%
Q68
 
4.9%
Other values (10)212
15.3%
Decimal Number
ValueCountFrequency (%)
32
40.0%
11
20.0%
01
20.0%
61
20.0%
Space Separator
ValueCountFrequency (%)
55
100.0%
Other Punctuation
ValueCountFrequency (%)
/1
100.0%
Math Symbol
ValueCountFrequency (%)
+58
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1387
92.1%
Common119
 
7.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
A210
15.1%
N139
10.0%
E137
9.9%
T133
9.6%
U132
9.5%
O129
9.3%
I82
 
5.9%
R77
 
5.6%
L68
 
4.9%
Q68
 
4.9%
Other values (10)212
15.3%
Common
ValueCountFrequency (%)
+58
48.7%
55
46.2%
32
 
1.7%
11
 
0.8%
01
 
0.8%
/1
 
0.8%
61
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII1506
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A210
13.9%
N139
9.2%
E137
9.1%
T133
8.8%
U132
 
8.8%
O129
 
8.6%
I82
 
5.4%
R77
 
5.1%
L68
 
4.5%
Q68
 
4.5%
Other values (17)331
22.0%

DTRATA
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct98
Distinct (%)33.6%
Missing0
Missing (%)0.0%
Memory size18.3 KiB
None
165 
2012-12-24
 
3
2012-03-15
 
3
2012-06-15
 
3
2012-06-19
 
3
Other values (93)
115 

Length

Max length10
Median length4
Mean length6.609589041
Min length4

Characters and Unicode

Total characters1930
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique72 ?
Unique (%)24.7%

Sample

1st row2012-01-05
2nd rowNone
3rd row2012-01-12
4th rowNone
5th row2012-01-23

Common Values

Value               Count  Frequency (%)
None                  165  56.5%
2012-12-24              3  1.0%
2012-03-15              3  1.0%
2012-06-15              3  1.0%
2012-06-19              3  1.0%
2012-04-09              3  1.0%
2012-07-12              2  0.7%
2012-06-01              2  0.7%
2012-05-17              2  0.7%
2012-12-30              2  0.7%
Other values (88)     104  35.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none165
56.5%
2012-06-153
 
1.0%
2012-03-153
 
1.0%
2012-04-093
 
1.0%
2012-12-243
 
1.0%
2012-06-193
 
1.0%
2012-07-102
 
0.7%
2012-07-162
 
0.7%
2012-05-082
 
0.7%
2012-07-122
 
0.7%
Other values (88)104
35.6%

Most occurring characters

ValueCountFrequency (%)
2328
17.0%
0291
15.1%
-254
13.2%
1207
10.7%
N165
8.5%
o165
8.5%
n165
8.5%
e165
8.5%
337
 
1.9%
533
 
1.7%
Other values (5)120
 
6.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1016
52.6%
Lowercase Letter495
25.6%
Dash Punctuation254
 
13.2%
Uppercase Letter165
 
8.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2328
32.3%
0291
28.6%
1207
20.4%
337
 
3.6%
533
 
3.2%
729
 
2.9%
627
 
2.7%
426
 
2.6%
922
 
2.2%
816
 
1.6%
Lowercase Letter
ValueCountFrequency (%)
o165
33.3%
n165
33.3%
e165
33.3%
Dash Punctuation
ValueCountFrequency (%)
-254
100.0%
Uppercase Letter
ValueCountFrequency (%)
N165
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1270
65.8%
Latin660
34.2%

Most frequent character per script

Common
ValueCountFrequency (%)
2328
25.8%
0291
22.9%
-254
20.0%
1207
16.3%
337
 
2.9%
533
 
2.6%
729
 
2.3%
627
 
2.1%
426
 
2.0%
922
 
1.7%
Latin
ValueCountFrequency (%)
N165
25.0%
o165
25.0%
n165
25.0%
e165
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1930
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2328
17.0%
0291
15.1%
-254
13.2%
1207
10.7%
N165
8.5%
o165
8.5%
n165
8.5%
e165
8.5%
337
 
1.9%
533
 
1.7%
Other values (5)120
 
6.2%

DT_ENCERRA
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing292
Missing (%)100.0%
Memory size2.4 KiB

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
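The calculation described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code the report generator runs: the function name `pearson_r` and the toy data are assumptions made for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical toy data, not taken from the report.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting y and rescaling it by a positive factor leaves r unchanged.
r_shifted = pearson_r(x, [3.0 * v + 7.0 for v in y])
print(r, r_shifted)
```

The second call illustrates the invariance under changes in location and scale mentioned above; a negative scale factor would flip the sign of r.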

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
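As a rough sketch of that recipe, the snippet below ranks both variables (ties receive the average of their positions, which is an assumption of this example) and then applies the Pearson formula to the ranks. The helper names and the toy data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this monotonic but nonlinear toy relationship, ρ reaches 1 while r stays below 1, which is the behaviour the paragraph above describes.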

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
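The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; the function name and the toy data are assumptions, and statistical libraries often default to a tie-adjusted variant (tau-b), so values can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(zip(x, y), 2))
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts as neither.
    return (concordant - discordant) / len(pairs)


# Hypothetical toy data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.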

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
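The following sketch shows one way the bias-corrected coefficient can be computed from two lists of category labels, following the correction commonly attributed to Bergsma (2013). It is an illustration under stated assumptions, not the report generator's own code: the function name, the handling of constant columns and the toy data are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        # A constant column: association is undefined; this sketch returns 0.
        return 0.0
    return sqrt(phi2_corr / denom)


# Hypothetical toy data.
perfect = cramers_v_corrected(["a", "b"] * 20, ["u", "v"] * 20)
independent = cramers_v_corrected(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(perfect, independent)
```

The first call pairs every label in one list with a single label in the other (perfect association); the second spreads the labels evenly across all combinations (independence).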

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
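One common way to obtain a nullity correlation is to turn each column into a present/missing indicator and correlate the indicators. The sketch below does this in plain Python; the function name, the use of `None` as the missing marker and the toy columns are assumptions, and the report's own implementation may differ.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    if ss_a == 0 or ss_b == 0:
        # A fully present or fully missing column has no variation to correlate.
        return None
    return cov / sqrt(ss_a * ss_b)


# Hypothetical toy columns; None marks a missing value.
treatment_date = ["2012-01-05", None, "2012-01-12", None, "2012-01-23", None]
treatment_scheme = ["1", None, "1", None, "99", None]
other_field = [None, "x", None, "y", None, "z"]

same = nullity_correlation(treatment_date, treatment_scheme)
opposite = nullity_correlation(treatment_date, other_field)
print(same, opposite)
```

A value of 1 means the two columns are always present or missing together, and -1 means one is present exactly when the other is missing.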

Sample

First rows

TP_NOTID_AGRAVODT_NOTIFICSEM_NOTNU_ANOSG_UF_NOTID_MUNICIPID_REGIONAID_UNIDADEDT_SIN_PRISEM_PRIDT_NASCNU_IDADE_NCS_SEXOCS_GESTANTCS_RACACS_ESCOL_NSG_UFID_MN_RESIID_RG_RESIID_PAISDT_INVESTID_OCUPA_NCLASSI_FINAT_ATIVIDAAT_LAMINAAT_SINTOMATPAUTOCTOCOUFINFCOPAISINFCOMUNINFLOC_INFDEXAMERESULTPMMPCRUZTRA_ESQUEMDSTRAESQUEDTRATADT_ENCERRA
02B542012-01-0520120120123333045527083532012-01-022012011983-10-234028M6108333304901NaT810205111212AP1160027SITO2012-01-054480.0312012-01-05NaT
12B542012-01-0620120120123333045522978332012-01-062012011996-01-164015F5104333304551NaT9999912102102012-01-061NaNNoneNaT
22B542012-01-12201202201233330455652012-01-122012021959-05-284052M6404333304551NaT233105110212AP1160040MAZA2012-01-124450.0312012-01-12NaT
32B542012-01-1720120320123333045530059922012-01-032012011989-02-024022M6407333304551NaT999991242102012-01-171NaNNoneNaT
42B542012-01-2320120420123333045522883382012-01-192012031981-08-224030M6408333301701NaT21231519212AP1160027LARA2012-01-234560.0412012-01-23NaT
52B542012-01-2320120420123333045522883382012-01-182012031971-06-224040M6106333304551NaT711130110212AP1160027LARA2012-01-2347760.049910 CLOROQUINA/36 PRIMAQUINA2012-01-23NaT
62B542012-01-2520120420123333008026969242011-12-302011521971-01-194040F5108333300801NaT111311RJ13300802012-02-0243.0112012-02-02NaT
72B542012-01-2720120420123333024022765342012-01-262012041978-09-204033M6209333302401NaT31321511011231LUAN2012-01-272NaN199ARTESUNATO + MEFLOQUINA 3 DIAS2012-01-27NaT
82B542012-01-3120120520123333045530059922012-01-262012041954-04-194057M6406333304551NaT2522102102102012-01-311NaNNoneNaT
92B542012-02-0620120620123333045530059922012-02-042012051981-04-074030M6108333304551NaT999991111212AM1130120COAR2012-02-064680.0412012-02-06NaT

Last rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
2822B542012-12-2020125120123333062022737482012-12-092012501968-01-284044M6404333303601NaT782510112112AC11200402012-12-2042920.0412012-12-24NaT
2832B542012-12-2020125120123333045522704712012-12-202012511948-03-164064M6106333304551NaT2102102012-12-201NaNNoneNaT
2842B542012-12-2020125120123333036022793982012-12-092012501968-11-284044M6333303601NaT111322RJ1330360ACRE2012-12-2045011000.0612012-12-24NaT
2852B542012-12-2420125220123333045522883382012-12-102012501968-01-284044M6109333303601NaT210212AC11200202012-12-244780.0412012-12-24NaT
2862B542012-12-2620125220123333045527083532012-12-112012501966-11-284046M6107333304551NaT110212RO11100012012-12-2643080.0412012-12-26NaT
2872B542012-12-2620125220123333045522883382012-12-192012511986-10-224026M6108333304551NaT2102102012-12-261NaNNoneNaT
2882B542012-12-2820125220123333045522704712012-12-272012521971-03-204041M6408333304521NaT2144052102102012-12-281NaNNoneNaT
2892B542012-12-2820125220123333045530487212012-12-152012501975-06-154037M6106333304551NaT3188052102102012-12-281NaNNoneNaT
2902B542012-12-30130120123333045522704712012-12-2912522011-12-274001M6110333304551NaT11021231LUAN2012-12-302380.0399ARTESUNATO+MEFLOQUINA2012-12-30NaT
2912B542012-12-30130120123333045522704712012-12-2912521964-11-204048M6108333304551NaT22210511021231LUAN2012-12-302280.0299ARTESUNATO+MEFLOQUINA2012-12-30NaT